Detect Labels, Faces, and Landmarks in Images with the Cloud Vision API

12th iThome Ironman Contest

DAY 6

AI & Data

Machine Learning Study Jam 2020 series, part 6

Tags: 12th Ironman Contest, ml study jam 2020, vision api

Only Live Once

Team: "Sorry, you're a nice person, but we're just online friends" (對不起，你是個好人，但我們只是網友)

2020-09-19 22:33:50


As the old saying goes, a picture is worth a thousand words.

People are drawn to visuals more than to words.
Research suggests that images are easier to remember than text.

No doubt you'd rather watch a video than read, right? XD

Finally, we're moving on to a more interesting topic: DETECTING WHAT'S IN IMAGES!!!

Wait no more! Let's start~

Open Google Cloud Platform (follow the steps in A Tour of Qwiklabs and Google Cloud)
Activate Cloud Shell
Same as in the previous lesson.
Create an API Key
Same as in the previous lesson. Save the key in the API_KEY environment variable, since the curl commands below read it from ${API_KEY}.
Upload an Image to a Cloud Storage bucket
Go to Storage in Google Cloud Platform and create a bucket with a globally unique name.
We use this image, donuts.png, as a sample.

Once the image is uploaded, set its permission to PUBLIC. The file should now show as having public access.
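
An uploaded file can be referred to in two ways: the gs:// URI that the Vision API request uses, and the public HTTPS URL you can open in a browser to check that public access really works. Here is a small sketch of both. The helper names are made up for illustration, and my-bucket-name is a placeholder for your own bucket.

```python
from urllib.parse import quote


def gcs_uri(bucket: str, object_name: str) -> str:
    """Return the gs:// URI used in the gcsImageUri field of a request."""
    return f"gs://{bucket}/{object_name}"


def public_url(bucket: str, object_name: str) -> str:
    """Return the HTTPS URL of a publicly readable Cloud Storage object."""
    # Percent-encode the object name so spaces and similar characters are safe.
    return f"https://storage.googleapis.com/{bucket}/{quote(object_name)}"


# "my-bucket-name" is a placeholder; use your own bucket name.
print(gcs_uri("my-bucket-name", "donuts.png"))
print(public_url("my-bucket-name", "donuts.png"))
```

If the second URL opens in a private browser window, the file is public.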

Create Vision API request
Create a request.json file with the following content (replace my-bucket-name with the name of your own bucket):

{
  "requests": [
      {
        "image": {
          "source": {
              "gcsImageUri": "gs://my-bucket-name/donuts.png"
          }
        },
        "features": [
          {
            "type": "LABEL_DETECTION",
            "maxResults": 10
          }
        ]
      }
  ]
}
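
Since only the image URI and the feature type change between requests in this post, the same JSON could also be generated instead of edited by hand. A minimal sketch in Python; build_request is a hypothetical helper, not part of any Google library.

```python
import json


def build_request(gcs_image_uri, feature_types, max_results=None):
    """Build the body of an images:annotate request for a single image."""
    features = []
    for feature_type in feature_types:
        feature = {"type": feature_type}
        # maxResults is optional, so only include it when asked for.
        if max_results is not None:
            feature["maxResults"] = max_results
        features.append(feature)
    return {
        "requests": [
            {
                "image": {"source": {"gcsImageUri": gcs_image_uri}},
                "features": features,
            }
        ]
    }


# "my-bucket-name" is a placeholder for your own bucket.
body = build_request(
    "gs://my-bucket-name/donuts.png", ["LABEL_DETECTION"], max_results=10
)
print(json.dumps(body, indent=2))
```

Saving the printed output as request.json gives the same request as the hand-written file.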

Label Detection
Call the Vision API:

curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json "https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}"
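
If you'd rather call the API from a script than from curl, the same POST can be sketched with Python's standard library. The function names are hypothetical, and the code assumes request.json exists and the key is in the API_KEY environment variable, as in the curl command.

```python
import json
import urllib.request

ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_http_request(body: dict, api_key: str) -> urllib.request.Request:
    """Prepare the same POST that the curl command sends."""
    return urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def annotate(body: dict, api_key: str) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_http_request(body, api_key)) as resp:
        return json.load(resp)


# Example (needs network access and a real key in API_KEY):
#   import os
#   with open("request.json") as f:
#       print(annotate(json.load(f), os.environ["API_KEY"]))
```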

It returns a list of labels (words) describing what's in the image. Each label has these fields:

description -> name of the item.
score -> a number from 0 to 1 indicating how confident the API is that the description matches what's in the image.
mid -> value that maps to the item's mid in Google's Knowledge Graph. You can use the mid when calling the Knowledge Graph API to get more information on the item.
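
To make the three fields concrete, here is a rough sketch that pulls them out of a response and keeps only the confident labels. The sample response is made up (the mid values and scores are not real output) and summarize_labels is a hypothetical helper.

```python
def summarize_labels(response: dict, min_score: float = 0.0):
    """Return (description, score, mid) tuples for the first image in a response."""
    annotations = response["responses"][0].get("labelAnnotations", [])
    return [
        (a["description"], a["score"], a.get("mid"))
        for a in annotations
        if a["score"] >= min_score
    ]


# Hypothetical sample shaped like a LABEL_DETECTION response; values are made up.
sample = {
    "responses": [
        {
            "labelAnnotations": [
                {"mid": "/m/example1", "description": "Food", "score": 0.95},
                {"mid": "/m/example2", "description": "Dessert", "score": 0.72},
            ]
        }
    ]
}

# Keep only labels with a score of at least 0.8.
for description, score, mid in summarize_labels(sample, min_score=0.8):
    print(f"{description}: {score:.2f} ({mid})")
```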

Web Detection
The Vision API can also search the Internet for additional details on your image. Through the API's webDetection method, you get a lot of interesting data back:

A list of entities found in your image, based on content from pages with similar images
URLs of exact and partial matching images found across the web, along with the URLs of those pages
URLs of similar images, like doing a reverse image search

We use the same image, donuts.png, for web detection.

This time, edit the same request.json as follows:

{
  "requests": [
      {
        "image": {
          "source": {
              "gcsImageUri": "gs://my-bucket-name/donuts.png"
          }
        },
        "features": [
          {
            "type": "WEB_DETECTION",
            "maxResults": 10
          }
        ]
      }
  ]
}

Notice that type has changed from LABEL_DETECTION to WEB_DETECTION.

Then run the same command as above:

curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json "https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}"

Entities of this image will be listed under webEntities:

If you scroll further down the result, you will see URLs of images similar to the one you submitted under visuallySimilarImages:
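
The web detection result is long, so it can help to boil it down to the parts described above. A sketch, assuming the response keeps these lists under webDetection; the sample data uses made-up example.com URLs and summarize_web_detection is a hypothetical helper.

```python
def summarize_web_detection(response: dict) -> dict:
    """Collect entity names and image/page URLs from a WEB_DETECTION response."""
    web = response["responses"][0].get("webDetection", {})
    return {
        "entities": [e.get("description", "") for e in web.get("webEntities", [])],
        "full_matches": [i["url"] for i in web.get("fullMatchingImages", [])],
        "partial_matches": [i["url"] for i in web.get("partialMatchingImages", [])],
        "pages": [p["url"] for p in web.get("pagesWithMatchingImages", [])],
        "similar": [i["url"] for i in web.get("visuallySimilarImages", [])],
    }


# Hypothetical sample; entity names, scores, and URLs are made up.
sample = {
    "responses": [
        {
            "webDetection": {
                "webEntities": [{"description": "Donut", "score": 0.9}],
                "pagesWithMatchingImages": [{"url": "https://example.com/page"}],
                "visuallySimilarImages": [{"url": "https://example.com/similar.png"}],
            }
        }
    ]
}

summary = summarize_web_detection(sample)
print(summary["entities"])
print(summary["similar"])
```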

Face Detection
The face detection method returns data on faces found in an image, including the emotions of the faces and their location in the image.

Sounds even more magical, right!!!

Let's upload another image, selfie.png, for face detection.

Once the image is uploaded, set its permission to PUBLIC. The file should now show as having public access.

Edit the same request.json so that it contains the following:

{
  "requests": [
      {
        "image": {
          "source": {
              "gcsImageUri": "gs://my-bucket-name/selfie.png"
          }
        },
        "features": [
          {
            "type": "FACE_DETECTION"
          },
          {
            "type": "LANDMARK_DETECTION"
          }
        ]
      }
  ]
}

Notice that we have two types here: FACE_DETECTION and LANDMARK_DETECTION.

Use the same command to call the Vision API:

curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json "https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}"

The API returns an object for each face found in the image. Take a look at the faceAnnotations object in the result:

boundingPoly -> the x,y coordinates around the face in the image.
fdBoundingPoly -> a smaller box than boundingPoly, focusing on the skin part of the face.
landmarks -> an array of objects for each facial feature, some you may not have even known about. This tells us the type of landmark, along with the 3D position of that feature (x,y,z coordinates) where the z coordinate is the depth.
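
Here is a rough sketch of reading those fields, plus the emotion likelihoods mentioned earlier, from a face detection response. The sample values are invented, summarize_faces and box_size are hypothetical helpers, and treating LIKELY or VERY_LIKELY as "detected" is just one possible choice.

```python
LIKELY = {"LIKELY", "VERY_LIKELY"}
EMOTIONS = ("joy", "sorrow", "anger", "surprise")


def box_size(poly: dict):
    """Return (width, height) of a bounding polygon in pixels."""
    # Default to 0 in case a coordinate is missing from a vertex.
    xs = [v.get("x", 0) for v in poly["vertices"]]
    ys = [v.get("y", 0) for v in poly["vertices"]]
    return max(xs) - min(xs), max(ys) - min(ys)


def summarize_faces(response: dict):
    """Summarize each face: box size, likely emotions, landmark positions."""
    faces = response["responses"][0].get("faceAnnotations", [])
    summary = []
    for face in faces:
        summary.append(
            {
                "size": box_size(face["boundingPoly"]),
                "emotions": [
                    e for e in EMOTIONS if face.get(f"{e}Likelihood") in LIKELY
                ],
                "landmarks": {
                    lm["type"]: lm["position"] for lm in face.get("landmarks", [])
                },
            }
        )
    return summary


# Hypothetical sample shaped like a FACE_DETECTION response; values are made up.
sample = {
    "responses": [
        {
            "faceAnnotations": [
                {
                    "boundingPoly": {
                        "vertices": [
                            {"x": 10, "y": 20},
                            {"x": 110, "y": 20},
                            {"x": 110, "y": 140},
                            {"x": 10, "y": 140},
                        ]
                    },
                    "landmarks": [
                        {"type": "LEFT_EYE", "position": {"x": 40.0, "y": 60.0, "z": 0.0}}
                    ],
                    "joyLikelihood": "VERY_LIKELY",
                    "sorrowLikelihood": "VERY_UNLIKELY",
                }
            ]
        }
    ]
}

for face in summarize_faces(sample):
    print(face["size"], face["emotions"], list(face["landmarks"]))
```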

Landmark Annotation
Landmark detection can identify common (and obscure) landmarks. It returns the name of the landmark, its latitude and longitude coordinates, and the location of where the landmark was identified in an image.

Wow, unbelievable, right!!!

Let's upload our last image, city.png, for landmark detection.

Once the image is uploaded, set its permission to PUBLIC. The file should now show as having public access.

Change gcsImageUri in request.json to point to city.png (the request already includes LANDMARK_DETECTION), then run the same command one last time:

curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json "https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}"

The result tells us this image was taken in Boston, along with the exact location:

boundingPoly -> region in the image where the landmark was identified.
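
The landmark name and its coordinates can be picked out of the response roughly like this. The sample below is invented (the name, score, and coordinates are placeholders, not the real output for city.png), and summarize_landmarks is a hypothetical helper.

```python
def summarize_landmarks(response: dict):
    """Return (name, latitude, longitude) for each detected landmark location."""
    landmarks = response["responses"][0].get("landmarkAnnotations", [])
    results = []
    for landmark in landmarks:
        for location in landmark.get("locations", []):
            lat_lng = location["latLng"]
            results.append(
                (landmark["description"], lat_lng["latitude"], lat_lng["longitude"])
            )
    return results


# Hypothetical sample shaped like a LANDMARK_DETECTION response; values are made up.
sample = {
    "responses": [
        {
            "landmarkAnnotations": [
                {
                    "description": "Example Landmark",
                    "score": 0.8,
                    "locations": [{"latLng": {"latitude": 42.0, "longitude": -71.0}}],
                }
            ]
        }
    ]
}

for name, lat, lng in summarize_landmarks(sample):
    print(f"{name}: {lat}, {lng}")
```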

Explore other Vision API methods

Logo detection -> identify common logos and their location in an image.
Safe search detection -> determine whether or not an image contains explicit content. This is useful for any application with user-generated content. You can filter images based on four factors: adult, medical, violent, and spoof content.
Text detection -> run OCR to extract text from images. This method can even identify the language of text present in an image.
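
As an example of using one of these, here is a sketch of filtering user-uploaded images with a safe search result. It assumes the response reports each category as a likelihood string under safeSearchAnnotation; the sample values are made up, flagged_categories is a hypothetical helper, and the LIKELY threshold is only one possible policy.

```python
LIKELIHOOD_ORDER = [
    "UNKNOWN",
    "VERY_UNLIKELY",
    "UNLIKELY",
    "POSSIBLE",
    "LIKELY",
    "VERY_LIKELY",
]


def flagged_categories(response: dict, threshold: str = "LIKELY"):
    """Return the categories whose likelihood is at or above the threshold."""
    annotation = response["responses"][0].get("safeSearchAnnotation", {})
    limit = LIKELIHOOD_ORDER.index(threshold)
    return sorted(
        category
        for category, likelihood in annotation.items()
        # Skip any field that is not a likelihood string.
        if likelihood in LIKELIHOOD_ORDER
        and LIKELIHOOD_ORDER.index(likelihood) >= limit
    )


# Hypothetical sample shaped like a SAFE_SEARCH_DETECTION response; values are made up.
sample = {
    "responses": [
        {
            "safeSearchAnnotation": {
                "adult": "VERY_UNLIKELY",
                "spoof": "POSSIBLE",
                "medical": "UNLIKELY",
                "violence": "LIKELY",
            }
        }
    ]
}

print(flagged_categories(sample))
print(flagged_categories(sample, threshold="POSSIBLE"))
```

An application could reject or hold for review any image where the returned list is not empty.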

The Vision API does a lot of amazing work!
Different feature types give different results. How convenient!